A Relevance-Based Language Modeling Approach to DUC 2005

Author

  • Vasudeva Varma
Abstract

The task in the Document Understanding Conference (DUC) 2005 is to generate a fixed-length, user-oriented, multi-document summary. Our approach to this task is primarily motivated by the observation that metrics based on key-concept overlap give better results than metrics based on n-gram and sentence overlap. In this paper, we present a sentence-extraction-based summarization system that scores sentences using Relevance-Based Language Modeling, Latent Semantic Indexing (LSI) and the number of special words. From these scored sentences, the system generates a summary of the required granularity. Our summarization system was ranked 3rd, 4th, 8th and 17th in the ROUGE-SU4, ROUGE-2, responsiveness and linguistic quality evaluations, respectively. In post-DUC analysis we found that LSI has a negative effect on the system's performance, and that performance improved by 5.4% when the system is implemented using language modeling and the number of special words.
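
The scoring-and-extraction pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a Dirichlet-smoothed query-likelihood score stands in for the relevance-based language model, LSI is left out, and the `special_words` set, the `weight` parameter, and the 250-word default budget are all assumptions made for illustration.

```python
import math
from collections import Counter

PUNCT = ".,;:!?()\"'"

def tokenize(text):
    """Lowercase whitespace tokenizer that strips surrounding punctuation."""
    tokens = (w.strip(PUNCT).lower() for w in text.split())
    return [t for t in tokens if t]

def relevance_score(sent_tokens, query_tokens, coll_counts, coll_size, mu=10.0):
    """Average log-probability of the query terms under a Dirichlet-smoothed
    unigram model of the sentence (a stand-in, not the paper's exact model)."""
    counts = Counter(sent_tokens)
    vocab = len(coll_counts)
    total = 0.0
    for q in query_tokens:
        # Add-one smoothed collection probability keeps the log finite.
        p_coll = (coll_counts.get(q, 0) + 1) / (coll_size + vocab + 1)
        total += math.log((counts.get(q, 0) + mu * p_coll) / (len(sent_tokens) + mu))
    return total / max(len(query_tokens), 1)

def summarize(sentences, query, special_words, max_words=250, weight=0.1):
    """Score each sentence, then greedily pick the best ones within a word budget."""
    tokenized = [tokenize(s) for s in sentences]
    coll_counts = Counter(t for toks in tokenized for t in toks)
    coll_size = sum(coll_counts.values())
    query_tokens = tokenize(query)

    scored = []
    for idx, toks in enumerate(tokenized):
        if not toks:
            continue
        rel = relevance_score(toks, query_tokens, coll_counts, coll_size)
        special = sum(1 for t in toks if t in special_words)  # hypothetical feature
        scored.append((rel + weight * special, idx))

    # Highest score first; ties broken by original position.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))

    chosen, used = [], 0
    for _, idx in scored:
        n_words = len(sentences[idx].split())
        if used + n_words <= max_words:
            chosen.append(idx)
            used += n_words

    # Emit the selected sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]

docs = [
    "Floods damaged the city bridge.",
    "The weather was pleasant in spring.",
    "Engineers repaired the bridge after the floods.",
]
summary = summarize(docs, "bridge floods", {"engineers"}, max_words=12)
```

With a 12-word budget, the two sentences that mention the query terms fit and the unrelated sentence is left out. How the real system defines special words and combines the component scores is not stated in the abstract.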


Similar resources

The Embra System at DUC 2005: Query-oriented Multi-document Summarization with a Very Large Latent Semantic Space

We present the Embra system, a first-time entry to DUC for 2005 which performed at or above median for the manual assessment of responsiveness and on 4 out of 5 linguistic quality questions. The system takes a novel approach to relevance and redundancy, modeling sentence similarity using a latent semantic space constructed over a very large corpus. We present a simple approach to modeling speci...


Advertising Keyword Suggestion Using Relevance-Based Language Models from Wikipedia Rich Articles

When emerging technologies such as Search Engine Marketing (SEM) face tasks that require human level intelligence, it is inevitable to use the knowledge repositories to endow the machine with the breadth of knowledge available to humans. Keyword suggestion for search engine advertising is an important problem for sponsored search and SEM that requires a goldmine repository of knowledge. A recen...


Structured queries, language modeling, and relevance modeling in cross-language information retrieval

Two probabilistic approaches to cross-lingual retrieval are in wide use today, those based on probabilistic models of relevance, as exemplified by INQUERY, and those based on language modeling. INQUERY, as a query net model, allows the easy incorporation of query operators, including a synonym operator, which has proven to be extremely useful in cross-language information retrieval (CLIR), in a...


Modeling and Non-modeling Genre-based Approach to Writing Argument-led Introduction Paragraphs: A Case of English Students in Iran

Despite the crucial role of introductory sections in argumentative academic writing, the effects of genre-based approaches to writing introductory paragraphs have not been much explored yet. The present study aimed to investigate whether the provision of genre knowledge through modeling and non-modeling could enhance learners’ ability in writing introductory paragraphs of argumentative essays....


The Hong Kong Polytechnic University at DUC2005

This paper discusses the query-based multi-document summarization techniques implemented by the Hong Kong Polytechnic University at DUC 2005. The summarization system is built under the framework of MEAD. In addition to borrowing the features provided by MEAD for text summarization, including centroid and sentence length, we also introduce the entity-based, pattern-based, term-based and semanti...




Publication date: 2005